Introduction

Arboretum Waterway

UC Davis Arboretum Putah Creek Waterway

This data analysis dashboard was created to take a more insightful, data-driven look at the water quality samples collected by the UC Davis Arboretum and Public Garden between 2016 and 2018. My goal was to make the dashboard interactive and easy to explore so that it can support data-driven decisions about management and maintenance in the Arboretum. This project combines my passion for data analysis, solving real-world problems, and bringing positivity to my community. I looked at water samples from 7 locations throughout the UC Davis Arboretum waterway, which can be seen on the Water Sample Locations Map tab above. For each of the 7 locations, I analyzed 9 features from the data set that measure water quality: Temperature, Electrical Conductivity, pH, Turbidity, Total Phosphorus, Total Nitrogen, Dissolved Organic Nitrogen, Dissolved Organic Carbon, and Dissolved Organic Matter. For the study, I used summary statistics, a correlation plot, exploratory graphs, and the unsupervised machine learning method of hierarchical clustering to look for patterns in water quality between the locations.

About Features

About Features

Below is a description of each feature that was analyzed in the study to measure water quality in the UC Davis Arboretum along the Putah Creek Waterway.

Temperature: The temperature of the water. Measured in \(^{\circ}\)C (degrees Celsius).

Electrical Conductivity: The ability of water to conduct an electrical current. Measured in µS/cm (microsiemens per centimeter).

pH: A measure of how acidic or basic water is. The scale ranges from 0 to 14, with 7 being neutral. A pH of less than 7 indicates acidity, whereas a pH of greater than 7 indicates a base.

Dissolved Organic Nitrogen (DON): In lakes and rivers, DON originates from photosynthetic organisms (algae and plants) and the excretion of nitrogenous waste by animals, but also from soil leachate, sewage discharge, and atmospheric deposition. Measured in mg/L (milligrams per liter).

Total Nitrogen (TN): The sum of total Kjeldahl nitrogen (ammonia, organic and reduced nitrogen) and nitrate-nitrite. Measured in mg/L (milligrams per liter).

Total Phosphorus (TP): Phosphorus is a nutrient important for plant growth. Phosphorus originates from a variety of sources, many of which are related to human activities; major sources include human and animal wastes, soil erosion, detergents, septic systems and runoff from farmland or fertilized lawns. Measured in mg/L (milligrams per liter).

Dissolved Organic Carbon (DOC): The organic material dissolved in water. It results from the decomposition of plants or animals; once this decomposed organic material contacts water, it may partially dissolve. Measured in mg/L (milligrams per liter).

Dissolved Organic Matter (DOM): Soluble organic materials derived from the partial decomposition of organic materials, including soil organic matter, plant residues, and soluble particles released by living organisms such as bacteria, algae, and plants. Measured as a C:N ratio (carbon-to-nitrogen ratio).

Turbidity: The quality of being cloudy, opaque, or thick with suspended matter. Measured in NTU (nephelometric turbidity units).

UC Davis Arboretum Map

UC Davis Arboretum Map

Below is a map of the UC Davis Arboretum waterway from which the water sample data was obtained. The location IDs do not appear in sequential order on the map because, for example, Location 7 was added to the water sampling data set after Locations 1 through 6. Keep in mind throughout the study that each location ID refers to one specific site.

Water Sample Locations Map

Summary Statistics by Year

Summary Statistics by Year

Summary Statistics by Location

Summary Statistics by Location

Correlation Plot

Correlation Plot

Interactive Graph Preface

Interactive Graph Preface

I plotted the data as scatter plots, density plots, and line plots. The interactive capabilities of each graph type are described below. Please read the descriptions before looking at the graphs to make the most of the visualization tools.

Scatter Plots:
The scatter plots show each feature (such as Temperature or Electrical Conductivity) plotted over time to show how it changed in the Arboretum between 2016 and 2018. Each point is color coded by location. For example, yellow corresponds to the Water Outlet location, so every yellow point is the value of the feature at the Water Outlet at a specific time. You can hover over a point to see its exact values: the location it belongs to, the value of the feature, and the date the point was recorded. The graphs are interactive, so if you double click a dot on the legend, such as the yellow dot, the graph will show only the points for that location (here, Water Outlet). To get out of this view, double click on the graph again. You can also zoom into the graph to look at specific points by right clicking with the cursor on the graph and creating a window for the area you want to look at. Again, to get out of this view, double click on the graph. The icons on the top right of the scatter plot let you zoom into the graph or download the plot as a PNG file.

Density Plots:
The density plots show each feature (such as Temperature or Electrical Conductivity) plotted by location, so you can compare how the values of that feature are distributed at each location. You can hover over a curve to see the exact values, such as the density and the feature value it belongs to. You can also zoom into the graph to look at specific densities by right clicking with the cursor on the graph and creating a window for the area you want to look at. To get out of this view, double click on the graph. The icons on the top right of the density plot let you zoom into the graph or download the plot as a PNG file.
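As a rough illustration of what a density curve represents, the sketch below estimates a density from a handful of readings with a Gaussian kernel. The readings are invented for the example, and the kernel and bandwidth rule (1.06 · sd · n^(-1/5)) are assumptions; the dashboard's plotting library may use different defaults.

```python
import math
from statistics import stdev


def gaussian_kde(samples, bandwidth=None):
    """Return a function that estimates the density of `samples`.

    Assumes a Gaussian kernel. If no bandwidth is given, a normal
    reference rule of thumb is used (an assumption for this sketch).
    """
    n = len(samples)
    if bandwidth is None:
        bandwidth = 1.06 * stdev(samples) * n ** (-1 / 5)

    def density(x):
        # Sum one Gaussian bump per observation, then normalize so the
        # curve integrates to 1.
        total = sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )
        return total / (n * bandwidth * math.sqrt(2 * math.pi))

    return density


# Hypothetical water temperatures (degrees Celsius) at one location.
temperatures = [18.0, 19.5, 20.0, 21.0, 22.5]
density = gaussian_kde(temperatures)
```

Calling `density(20.0)` gives the height of the curve at 20 degrees. Higher values mean that readings near that temperature were more common at the location.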

Line Plots (Median Values):
The line plots show the median value of each feature at each location ID. I used the median because the data is skewed overall, and the median is less affected by skew than the mean. There are two lines on each plot: the orange line is the overall median of the seven locations combined, and the blue line is the median at each location for the selected feature. You can hover over the lines at each location ID to see the exact median values. The graphs are interactive, so if you double click an entry on the legend, such as the blue line, the graph will show only that line (here, the median at each individual location). To get out of this view, double click on the graph again. You can also zoom into the graph to look at specific points by right clicking with the cursor on the graph and creating a window for the area you want to look at. Again, to get out of this view, double click on the graph. The icons on the top right of the line plot let you zoom into the graph or download the plot as a PNG file.
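The two lines can be thought of as two small calculations, sketched below. The readings are invented for illustration, and treating the overall median as the median of all readings pooled across locations is an assumption about how the orange line was computed.

```python
from statistics import mean, median

# Hypothetical turbidity readings (NTU) keyed by location ID.
# These values are made up and are not from the Arboretum data set.
readings = {
    1: [4.0, 5.0, 6.0, 48.0],       # one extreme value skews this location
    2: [7.0, 8.0, 9.0],
    3: [2.0, 3.0, 3.5, 4.0, 30.0],
}


def location_medians(data):
    """Median of each location's readings (the per-location line)."""
    return {loc: median(vals) for loc, vals in data.items()}


def overall_median(data):
    """Median of all readings pooled across locations (the overall line)."""
    pooled = [v for vals in data.values() for v in vals]
    return median(pooled)


per_location = location_medians(readings)
overall = overall_median(readings)

# The mean of location 1 is pulled up by the single extreme reading,
# while the median stays close to the typical values.
mean_location_1 = mean(readings[1])
```

In this made-up example the median at location 1 is 5.5 while its mean is 15.75, which shows why the median is a steadier summary when a few extreme readings are present.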

Temperature

Temperature: Scatter Plot

Temperature: Density Plot

Temperature: Line Plot (Median Values)

Electrical Conductivity

Electrical Conductivity: Scatter Plot

Electrical Conductivity: Density Plot

Electrical Conductivity: Median Values

pH

pH: Scatter Plot

pH: Density Plot

pH: Median Values

Turbidity

Turbidity: Scatter Plot

Turbidity: Density Plot

Turbidity: Median Values

Total Phosphorus

Total Phosphorus: Scatter Plot

Total Phosphorus: Density Plot

Total Phosphorus: Median Values

Total Nitrogen

Total Nitrogen: Scatter Plot

Total Nitrogen: Density Plot

Total Nitrogen: Median Values

Dissolved Organic Nitrogen

Dissolved Organic Nitrogen: Scatter Plot

Dissolved Organic Nitrogen: Density Plot

Dissolved Organic Nitrogen: Median Values

Dissolved Organic Carbon

Dissolved Organic Carbon: Scatter Plot

Dissolved Organic Carbon: Density Plot

Dissolved Organic Carbon: Median Values

Dissolved Organic Matter

Dissolved Organic Matter: Scatter Plot

Dissolved Organic Matter: Density Plot

Dissolved Organic Matter: Median Values

Hierarchical Clustering Preface

Hierarchical Clustering Preface

The following graphs are circular dendrograms and heatmaps built with hierarchical clustering, specifically Ward’s Method (ward.D). For each graph type there are four plots: one for all years combined (2016 to 2018), which gives an aggregated picture, and one for each individual year, which shows how the pattern changes from year to year. It is interesting to compare and contrast the graphs to see what changed over the years. I chose these graphs because, as the saying goes, “A picture is worth a thousand words,” and a lot of insight can be gathered from them. I believe a visual depiction is the best way to show this data because it makes the data readable for anyone: you don’t have to be technically gifted to look at a picture and describe what you see.

Reading a Circular Dendrogram: To read a circular dendrogram, look at the subtree that each part of the circle corresponds to. For example, to start with Location 1, find the part of the tree colored red, which corresponds to Location 1. Then see which other colors appear in that same part of the dendrogram: the locations those colors correspond to are the ones most similar to Location 1.

Reading a Heatmap: To read the heatmaps, look for color patterns across each variable. For example, when looking at the variable Temperature, look for similar shades of color, then compare and contrast them to see how the values are similar within the given time span.

These graphs are all very visual, and it takes only a creative mind to pick up patterns and to reason about why certain patterns appear within the clustering. However, the clustering itself is purely mathematical: distances are computed with the Euclidean distance function, and the data is then grouped by hierarchical clustering using Ward’s Method. The patterns are therefore a mathematical representation of the relationships in the data through the years at the UC Davis Arboretum.
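To make the two steps concrete, here is a minimal sketch of agglomerative clustering with the classic Ward minimum-variance criterion on Euclidean distances. It is an illustration under assumptions, not the code behind the dashboard: the feature vectors and function names are hypothetical, and R's documentation notes that `ward.D` and `ward.D2` differ in whether the distances are squared, so this sketch should not be expected to reproduce the dashboard's trees.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ward_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if the clusters merge."""
    na, nb = len(cluster_a), len(cluster_b)
    d = euclidean(centroid(cluster_a), centroid(cluster_b))
    return (na * nb) / (na + nb) * d ** 2


def ward_clustering(labelled_points):
    """Merge the cheapest pair repeatedly; return the merge history."""
    clusters = {(label,): [pt] for label, pt in labelled_points.items()}
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        cost = ward_cost(clusters[a], clusters[b])
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, cost))
    return merges


# Hypothetical, already-scaled feature vectors for four locations.
locations = {
    "Location 1": [0.0, 0.0],
    "Location 2": [0.0, 1.0],
    "Location 3": [5.0, 5.0],
    "Location 4": [5.0, 7.0],
}
merges = ward_clustering(locations)
```

Each entry in `merges` records which two groups were joined and at what cost. A dendrogram draws exactly this history, with low-cost merges (similar locations) joined near the leaves and high-cost merges joined near the root. In practice the features would likely need to be put on a common scale first, since they are measured in different units.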

Circular Dendrograms

Circular Dendrogram All Years 2016 to 2018

Circular Dendrogram 2016

Circular Dendrogram 2017

Circular Dendrogram 2018

Circular Dendrogram Side by Side Comparison

Heatmaps

Heatmap All Years 2016 to 2018

Heatmap 2016

Heatmap 2017

Heatmap 2018

Heatmaps Side by Side Comparison